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We document our Fortran 77 code for multicanonical simulations of 4D U(l) lattice gauge theory in the neighborhood of its 
phase transition. This includes programs and routines for canonical simulations using biased Metropolis heatbath updating and 
overrelaxation, determination of multicanonical weights via a Wang-Landau recursion, and multicanonical simulations with fixed 
weights supplemented by overrelaxation sweeps. Measurements are performed for the action, Polyakov loops and some of their 
structure factors. Many features of the code transcend the particular application and are expected to be useful for other lattice gauge 
theory models as well as for systems in statistical physics. 

Program Summary 

Program title: STMCJJ1MUCA 

Program obtainable from: Temporarily from URL http : //www . hep . f su . edu/~berg/research . 
Distribution format: tar.gz 
Programming language: Fortran 77 

Computer: Any capable of compiling and executing Fortran code. 
Operating system: Any capable of compiling and executing Fortran code. 

Nature of problem: Efficient Markov chain Monte Carlo simulation of U(l) lattice gauge theory close to its phase transition. 

Measurements and analysis of the action per plaquette, the specific heat, Polyakov loops and their structure factors. 

Solution method: Multicanonical simulations with an initial Wang-Landau recursion to determine suitable weight factors. 

Reweighting to physical values using logarithmic coding and calculating jackknife error bars. 

Running time: The prepared tests runs took up to 74 minutes to execute on a 2GHz PC. 
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1. Introduction 

Continuum, quantum gauge theories are defined by their 
Lagrangian densities, which are functions of fields living in 
4D Minkowski space-time. In the path-integral representation 
physical observables are averages over possible field configu- 
rations weighted with an exponential factor depending on the 
action. By performing a Wick rotation to imaginary time the 
Minkowski metric becomes Euclidean. Discretization of this 
4D Euclidean space results in lattice gauge theory (LGT) - a 
regularization of the original continuum theory, which allows 
to address non-perturbative problems. For a textbook see, for 
instance, Ref . [ 1 ] . Physical results are recovered in the quantum 
continuum limit a — > 0, where a is the lattice spacing measured 
in units proportional to a diverging correlation length. 

U(l) pure gauge theory, originally introduced by Wilson 
is a simple 4D LGT. Nevertheless, determining its phase struc- 
ture beyond reasonable doubt has turned out to be a non-trivial 
computational task. One encounters a phase transition which is 
believed to be first-order on symmetric N A S lattices, e.g. |3l|3,|5t]. 
For a finite temperature N] x N t , N t < N s geometry the situ- 



ation is less clear: Either second-order for small N T and first- 
order for large N T |g|, or always second order, possibly corre- 
sponding to a novel renormalization group fixed point |01 . In 
3D U(l) gauge theory is confining for all values of the coupling 
on symmetric N] lattices QSL, [SB, while in the finite tempera- 
ture N 2 S x N T , N T < N s geometry a deconfining transition of 
Berezinsky-Kosterlitz-Thouless type ifToL [Till is expected, see 
111 211 for recent numerical studies. 

In lattice gauge theory one can evaluate Euclidean path in- 
tegrals stochastically by generating an ensemble of field con- 
figurations with Markov chain Monte Carlo (MCMC). In this 
paper we report MCMC techniques we used in [7]. They are 
based on multicanonical (MUCA) simulations B13l Il411 supple- 
mented by a Wang-Landau (WL) recursion [15], both employed 
in continuum formulations. For updating we use the biased 
Metropolis heatbath algorithm (BMHA) of 1 1 611 added by over- 
relaxation [17]. Observables include the specific heat, Polyakov 
loops and their structure factors (SFs) for low-lying momenta. 
For the analysis of these data binning is used to render auto- 
correlations negligible and a logarithmically coded reweighting 
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procedure calculates averages with jackknife error bars. 
Our program package 

STMCUlMUCA.tgz 

can be downloaded from the web at 



In our code we use Wilson's action 



|http : / / www . h ep . f su . edu/ ~ berg/research . 

Unfolding of STMCJJIMUCA . tgz with tar -zxvf creates 
a tree structure with root directory STMCJJIMUCA. Folders on 
the first level are ExampleRuns, ForProg and Libs. Besides 
in the subfolders of ExampleRuns, copies of all main programs 
are found in ForProg. Fortran functions and subroutines of our 
code are located in the subfolders of Libs, which are Fortran, 
Fortran_par, Ul, and Ul_par. Routines in Fortran and Ul 
are modular, so that they can be called in any Fortran program, 
while routines in the other two subfolders need user defined pa- 
rameter files, which they include at compilation. General pur- 
pose routines are in the Fortran subfolders and to a large ex- 
tent taken over from ForLib of 01411 . while routines specialized 
to U(l) are in the Ul folders. Parameter files are 

bmha.par, lat.par, lat.dat, mc.par (1) 

for canonical simulations and in addition 

sf.par, ulmuca.par (2) 

for SF measurements and MUCA simulations with WL re- 
cursion. The main programs and the routines of the subfold- 
ers Fortran_par and Ul_par include common blocks when 
needed. These common blocks with names common* . f are also 
located in the Fortran_par and Ul_par subfolders. 

This paper is organized as follows. In Sec.|2]we define U(l) 
LGT and introduce the routines for our BMHA and for mea- 
surements of some observables. Sec. [3] is devoted to our code 
for MUCA runs and to the analysis of these data. Sections [2] 
and[3]both finish with explicit example runs, where Sec.[3]uses 
on input action parameter estimates obtained in Sec. [2] A brief 
summary and conclusions are given in the final section @] 

2. Canonical simulations 

Our code is written for a variable dimension d and supposed 
to work for d > 2. However, its use has mainly been confined 
to d = 4 to which we restrict most of the subsequent discussion. 

After U(l) gauge theory is discretized its fundamental de- 
grees of freedom reside on the links of a 4D hypercubic lattice 
which we label by x, p: x is a 4D vector giving the location of 
a site, and p = 1,2,3,4 is the direction of the link originating 
from this site, p = 4 corresponds to the temporal direction of 
extension N T and p = 1, 2, 3 to the spatial directions of exten- 
sion N s . 

The system contains N 3 ^ x N T sites and x N T x4 degrees 
of freedom U x ,n mat belong to U(l) gauge group, which we 
parametrize by 



X fl=l V<[1 



(4) 



The product U X ^U X+ ^ V U+ + . ^U^ v is taken around a plaquette, 
an elementary square of the lattice. In 4D each link participates 
in 6 plaquettes (2(d - 1) in <f -dimension). Products such as 
Ux+fi,yU+ +t U+ v are called staples. 

In canonical simulations one generates an ensemble of con- 
figurations weighted with exp^S), the Boltzmann factor of a 
system with energy -S , which is in contact with a heatbath at 
inverse temperature Here /? is the inverse temperature of a 
statistical mechanics on the lattice and not the physical temper- 
ature of the LGT. The latter is given by the temporal extent of 
the lattice: T = l/(aN T ). 

An important property of the action is locality: For a 
link update only its interaction with a small set of neighbors 
is needed. The part of the action involving a link U Xyfl being 
updated (with all other links frozen) is: 




x+fi-v,v^ x-v,pt Ux-v,v) 



vtft 



(5) 



The sum in square brackets [...] runs over 6 staples and is eval- 
uated before updating the link. We denote it 

U u = aexp(/0 u ) (6) 
To simplify the notation we drop the x, p. subscripts of the link: 



S(U) ~ Re(UU u ). 

Thus the distribution 

P(U) ~ = g/SRe(E/£/ u ) 

needs to be sampled. In angular variables 

Re(UU u ) = aRe[e Ke+6u) ) = a cos(0 + 6 U ) 

and 

P(6)d6 ~ e ^ aco < e+e ^de = e^^dip, tp = 6 + 6 U . 
The final probability density function (PDF) is 

P(a-tp) ~ e ^ cos ^. 



(7) 



(8) 



(9) 



(10) 



It is straightforward to implement the Metropolis algorithm 1 1 8 ] 
for ([Tol l. A value ip„ eH , is proposed uniformly in the interval 
[0, 2jt) and then accepted with probability 



U X n = exp(*6U), 6U e [0,2*0 



(3) 
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P(a; (p id) 



(11) 
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Figure 1: The cumulative distribution function Fp(a = 3.9375; <p) for an arbi- 
trarily chosen value of the parameter a. 
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Figure 2: The cumulative distribution function Fq(o = 3.9375; <p) which serves 
as an approximation of Fp. Compare to Fig.fT] 



However, it has low acceptance rate in the region of interest 
(0.8 ^ B ^ 1.2). An efficient heatbath algorithm (HBA) is hard 
to design since the cumulative distribution function (CDF) 



F P (a; ip) 



£ P(a;ip')dip' 

X 2 >(a;^')#' 



(12) 



is not easily invertible, because it cannot be represented in terms 
of elementary functions. Nevertheless, two variations of heat- 
bath algorithms for U(l) do exist JH], Q. 

2.1. Biased Metropolis heatbath algorithm 

The MCMC updating of our code relies on a BMHA d, 
which approximates heatbath probabilities by tables as de- 
scribed in the following. The updating variable ip is drawn from 
some distribution Q(a; ip) and then accepted with probability 



min-! 1, 



P(a; (fnew) Q{a\ (fold) \ 
P{a; (p i d ) Q(a; ip„ ew ) ) ' 



(13) 



This turns out to be a special case of general acceptance prob- 
abilities introduced by Hastings 112 ill . One refers to to the pro- 
posals as biased when Q(a', <p id) I ' Q(ct', <fnew) + 1 holds. 

For the heatbath algorithm the proposal probability Q(a; <p) is 
chosen to be identical to the target distribution P(a; tp), so that 
<p new is always accepted. In practice it is sufficient that Q(a; ip) 
is a good approximation of P(a; ip). We approximate the CDF 
dT2b by a piece-wise linear function Fq(o; ip). Compare Fig. [T] 
and Fig. [2] We partition the Fq axis into n equidistant pieces 
(n = 16 in the figures), which defines tp l , ...,tp" values via the 
relation Fg(a; tpj) - F g(a; <p J_1 ) = 1 /n, and we call an interval 
(<p J , ip-*] a bin. The approximated PDF 



Q{a;<p) = 



dFr 



1 



1 



dip n(ipj — ipj l ) nAtpj 



(14) 



is flat in each bin and it is easy to generate tp from Q(a; ip): 
One first picks a bin j with uniform probability (1 /n) and then 
generates ip uniformly in this bin. 



We call this scheme BMHA because a Metropolis ac- 
cept/reject step is used and a bias is introduced to enhance the 
acceptance rate by making the proposal probability an approx- 
imation of the one of the heatbath algorithm. When the dis- 
cretization step goes to zero 



&<P J '"" _^ 1/P(a;<p„ ew ) _ P(a; (p oU ) 
A<pj°u l/P(a;tp i d ) ~ P(a;ip„ ew ) 



(15) 



follows from (fT4l) and the acceptance rate ( fT3l ) approaches 1 . 

P(a; ip) depends on an external parameter ^ a ^ a max = 
2 (d - 1) and the inverse temperature B. We discretize a for the 
proposal probability Q(a; ip). Before updating a link in the code 
we evaluate the sum of the staples U u and decompose it into the 
magnitude a and phase 9 U . At this stage a is then known. 

The following is a summary of the algorithm we use to gener- 
ate the ip variable with the PDF ( TTOb . These routines are located 
in Libs/Ul_par. 

2.1.1. Table generation 

1 . Choose m bins for the parameter axis a and n bins for the 
variable ip. Two mxn arrays are needed. We take m and n 
to be powers of 2, because n being a power of 2 speeds up 
finding j u ( f20b . though m can, in principle, be arbitrary. 

2. Calculate a discrete set of a values by 



1\ a„ 
(T = |i- - — ■ 

2/ n 



i — 1, m. 



3. For each a' evaluate 



/ >(a>')#' 



ff'P(a i ;<p>)d<p' 



(16) 



(17) 



by numerical integration with B - beta_table as set in 
mc . par. The inverse temperature of the canonical simula- 
tion is denoted beta in the code. We reserve the possibility 
to have different values of the inverse temperature for the 
BMHA table generation and for the simulation. Of course, 



3 



for beta_table = beta the table approximates the heat- 
bath distribution at beta. When a range of inverse temper- 
atures is used, as in MUCA simulations, one can tune the 
acceptance rate by playing with beta_table. The ranges 
that we used in multicanonical runs were narrow, so we 
were content with setting beta_table = bmax. 
4. Tabulate F'q(ip) by values if 1 '-' defined by 



./ 

and the differences 



C4)"'(3 



(18) 



(19) 



The common block common_bmhatab . f , which is listed below, 
stores the quantities ip u ' , Atp' rJ and In A<p' rJ , respectively: 

C Table for BMHA. 

common/tabbma/ tabbm(nbm2 ,nbml) , 
& delbm(nbm2 ,nbml) ,dbmln(nbm2,nbml) 

where the parameters nbml = m and nbm2 = n are set in the file 
bmha.par: 

c Biased Metropolis-HeatBath. (BMHA) parameters: 
parameter (nbml=2**5 , 
& n21og2=7,nbm2=2**n21og2) 

2.1.2. BMH updating 

Our implementation of BMH updates is given in the routine 
ul_bmha.f . A call to ul_bmha.f performs one sweep, which 
here is defined by updating each variable (U(l) matrix) once in 
sequential order. For each, single BMH update it calls the sub- 
routine ul_bmha_update . f . Its functions are shortly described 
in the following. 

1. After a and 6 U are calculated by the ul_getstaple sub- 
routine, determine the a bin k = lnt[a/a max ] + 1, where 
Int[;t] denotes rounding to the largest integer ^ x. 

2. For given k determine to which bin the previous value 
fold — (SoU+Su) mod 2n belongs (in the code 9 m value is 
stored in aphase array). This is done with a binary search 

joid -» joid + 2'' ■ signdp - ip u ° u ), i -» i - 1 (20) 



starting with j u = 0, i = log 2 n - 1 . 

3. Pick a new bin 

j new = Intfnri] + 1 , 

where r\ is a uniform random number in [0,1). 

4. Pick a new value 



■A<p<> 



"r 2 ■ 



(21) 



(22) 



where r2 is a uniform random number in [0,1). 

5. Accept (p new with probability (fT3l . 

6. If accepted, store 9 new = (2n + (p mw - 9 U ) mod 2n in the 
aphase array. 

For U(l) and SU(2) we found d that m = 32 and n = 
128 are large enough to achieve ~ 0.97 acceptance rate. Thus, 
the BMHA achieves practically heatbath efficiency. It becomes 
important for cases where a conventional HBA is difficult to 
implement and/or computationally inefficient. 



2.2. Overrelaxation 

We use overrelaxation (OR) to faster decorrelate the system. 
OR algorithms were introduced by Adler. See [22] and refer- 
ences given therein. In the formulation of Ref. 12311 the idea 
is to generate a new value of the link matrix that lies as far 
as possible away from the old value without changing the ac- 
tion too much. This is done by reflecting the old matrix about 
the link matrix, which maximizes the action locally. In U(l) 
LGT one reflects 9 u about the element Go, which maximizes 
the PDF (flOl i. The <p value that maximizes (TlOb is ipo = 9q + 9 U . 
Reflecting G u about 6q = —G u we find 



00 ~ (0oid - Go) = -26u - e ok 



(23) 



As 6„ ew d23l does not change the action, OR constitutes in our 
case a microcanonical update. Our implementation is the sub- 
routine ul_over . f , which performs one overrelaxation sweep. 
In the code 6 new = (6n - 29 u - o id) mod 2k. 

2.3. Example runs 

Short canonical simulations are needed to determine the ac- 
tion range for the multicanonical runs. We perform 1 BMHA 
sweep followed by 2 OR sweeps. Runs in 3D on 6 2 x 4 lattices 
are prepared in the subfolders 

C3D(94t<96xblp<9 and C3DQ4t86xb2p® 

of the folder ExampleRuns and in 4D on 6 3 x 4 lattices in 

C4D(84t<96xb<9p9 and C4D(54t<56xblpl . 

Parameters are set in the * .par and lat .dat files, the lattice 
size in lat . par and lat . dat: Run parameters in mc . par and 
the BMHA table size in bmha.par, which is kept the same for 
all runs. 

The general structure of our MCMC simulations is that out- 
lined in 11411 : Lattice set-up and initialization are done by the 
routine ul_init.f followed by nequi sweeps for equilibra- 
tion, which do not record measurements. Afterwards nrpt x 
rtmeas measurements are carried out, each after nsw sweeps (in 
H only nsw = 1). 

The 4D /3 values embrace the pseudo-transition region of the 
6 3 x 4 lattice: ft = 0.9 in the disordered and y8 = 1.1 in the 
ordered phase. To avoid divergence of the equilibration time 
with increasing lattice size, the start configuration has to match 
the phase: Ordered (i start = 1) in the ordered and disordered 
(i start = 2) in the disordered phase. 

The production program has to be compiled with 

../make77 ul_tsbmho.f 

where the make77 file is located one level up in the tree. 
Besides Fortran 77 compilation the make77 moves parameter 
files around, so that they are properly included by subroutines, 
which need them. The Fortran compiler defined in our make77 
is g77. You may have to replace it with the one used on your 
computing platform. In the tcshell CPU times are recorded by 
running the a . out executable with 

../CPUtime > & fln.txt & 
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in the background, where the file CPUtime is also located one 
level up in the tree. While running one can monitor the progress 
by looking up the progress. d file. In test runs on a Intel 
E5405 2 GHz quad-core PC using gll version 3.4.6 (Red Hat 
3.4.6-4) the execution times for the prepared simulations were 
3m35s in 4D and 41s in 3D. 

After a production run ana_tslul . f is used to analyze the 
action data. In the present context we only need the mean action 
values as input for the action ranges set in our multicanonical 
runs. With gnuplot hOl.plt or iOl.plt one plots the ac- 
tion histogram. The BMHA acceptance rate is also returned 
by ana_tslul . f . Further, the integrated autocorrelation length 
Tim is estimated, but the statistics may not always be sufficiently 
large for a good determination (the statistics needed to estimate 
the average action is much smaller lfl4lo . The relevant param- 
eters for calculating Tj nt have to be set in the analysis program 
and a plot is obtained with gnuplot altau . pit. 

As rounding errors in floating point operations depend on the 
compiler, the actual numbers obtained for the average action 
on different computing platforms may vary within statistical er- 
rors. In our 4D run they were 



actm/np = 0.46942 (18) at p = 0.9 , 
actm/np = 0.71728(14) at p = 1.1, 



(24) 
(25) 



where n p is the number of plaquettes. Approximated, this range 
across the phase transition will be used in the ulmuca . par file 
of our subsequent MUCA simulation: 

actmin = 5mi n = actm at p min = 0.9 , (26) 
actraax = S max = actm at p max — 1.1 . (27) 

The procedure for running 3D examples is the same as for 
the 4D runs described above. To cover the pseudo-transition 
region on 6 2 x 4 lattice we set the inverse temperatures for the 
canonical runs at/? = 1.0 in the disordered and p = 2.0 in the 
ordered phase. The choice of a broader range of temperatures 
is appropriate, related to finite size effects that scale logarith- 
mically with the lattice size. Average action values obtained 



actm/np = 0.47495(27) at p= 1.0, 
actm/np = 0.81079(15) at /? = 2.0 . 

3. Multicanonical simulations 



(28) 
(29) 



In MUCA [13] simulations one samples the phase space with 
weights, which are a working estimate 031 of the inverse spec- 
tral density up to an overall factor. This makes the energy his- 
togram approximately flat and allows for efficient reconstruc- 
tion of canonical expectation values in a range of tempera- 
tures (Pmin,Pmax)- This has many applications and is especially 
useful when studying the vicinity of first-order phase transi- 
tions, where canonical histograms exhibit a double-peak struc- 
ture, suppressing tunneling between phases even for relatively 
small-sized system (e.g., see [24] for a recent study of 3D Potts 
models). For second-order phase transitions the usefulness of 



MUCA simulations has been discussed in the context of clus- 
ter algorithms [25]. Most important and successful applications 
target complex systems like, for instance, biomolecules 02611 . 

The MUCA method consists of two parts 01411 : A recursion 
to determine the weights and a simulation with frozen weights. 
For the first part we have programmed a continuous version of 
the WL recursion II 1 511 . 



3.1. Wang-Landau recursion 

Because the WL code of this section is programmed for 
generic use in statistical physics systems we use the energy E 
instead of the action notation. In earlier recursions for MUCA 
parameters (see, e.g., lfl4fl ) one was iterating with a weight 
function of the energy, w(E), inversely proportional to the num- 
ber of histogram entries at E. In contrast to that the WL al- 
gorithm 01511 increments the weight multiplicatively, i.e., addi- 
tively in logarithmic coding: 



In w(E") — > In w(E ) - awL > o-wl > 



(30) 



at every WL update attempt E — > E', where addwl = awi in 
our code. Here E — E when the update is rejected and E — E' 
when the update is accepted. After a sufficiently flat histogram 
is sampled (we come back to this point), the WL parameter is 
refined, 

owl - > owl/2 ■ (31) 

In its original version the WL algorithm deals with discrete sys- 
tems like Ising or Potts models for which histogram entries cor- 
respond naturally to the discrete values of the energy. How- 
ever, for continuous systems binsizes are free parameters and 
we have to deal with their tuning. 

We discretize the U( 1 ) action range into bins of a size large 
enough in AS , so that one update cannot jump over a bin. Here 
AS < 4 (d — 1), because there are 2 (d - 1) plaquettes attached 
to a link and the range of the plaquette action is in the inter- 
val [-1, +1], which gives another factor 2. In the program this 
value AS is defined as delE. Next, we are not using constant 
weights over each bin, but instead a constant temperature inter- 
polation as already suggested in lfl3ll . 

One WL update has two parts: 

1 . A MUCA update of the energy (in our U( 1 ) code action) 
using the weights at hand. 

2. A WL update (p30) of the MUCA weights. 



In our code a WL sweep is done by a call to 



uljnucabmha.f , 



(32) 



which updates link variables in sequential order through calls 
to the included routine ul_update_mubmha.f . This routine 
generalizes BMH updating to the situation of MUCA weights, 
which is relatively straightforward (see the code for details) and 
increases efficiency compared to MUCA Metropolis simula- 
tions by a factor 3 to 5. It calls three modularly coded routines: 
The functions fmucaln.f and betax.f to calculate weights 
and p values as needed for the BMHA. After an action update 
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is done, the WL update ( f30b is performed by a call to the sub- 
routine wala_updt . f , which is at the heart of our modifications 
of the WL algorithm. 

The basic point is that wala_updt . f does not only iterate 
the number of histogram entries, but also the mean value within 
each histogram bin. The relevant lines of that code are listed 
below. 

C Put addwl <= zero in 2. part of MUCA. 
if (addwl . gt . zero) then 

win ( ix) =wln ( ix) -addwl 

xwl ( ix) = (hx ( ix) *xwl ( ix) +x) 
endif 

hx ( ix ) =hx ( ix ) +one 
hxO(ix)=hxO(ix)+one 
if (addwl . gt . zero) then 

xwl ( ix) =xwl ( ix) /hx ( ix) 
else 

xmu ( ix) =xmu ( ix) +x 
endif 

Besides performing the update (130b the routine tracks the his- 
togram entries in the array hx and for addwl > the mean value 
of x within bin ix as xwl(ix). For addwl < IS MUCA sim- 
ulations with fixed weights are performed. Then the xwl(ix) 
values are kept constant, but the array xmu allows one to calcu- 
late at a later stage the average within a histogram bin. 

The xwl(ix) values are essential entries for the Fortran 
functions fmuca.f and betax.f. Logarithmic WL weights 
wln(ix) correspond to the mean value positions. For general 
x values the function fmucaln.f interpolates the logarithmic 
weights linearly from the wln(ix) weights of the two neigh- 
boring mean values: 

x= (E-Emin) / delE 
ix=l+int (x) 

if (x . gt . xwl (1) . and. x . It . xwl (ixmax) ) then 
if (x. gt . xwl (ix) ) then 

ixl=ix+l 
else 

ixl=ix-l 
endif 

wl=abs (xwl (ixl) -x) 
w2=abs (xwl (ix) -x) 

fmucaln=(wl*wln(ix)+w2*wln(ixl))/(wl+w2) 
elseif (x . le . xwl (1) ) then ... 

With this input the function betax calculates the ft values used 
by the BMHA: 

if (x . le . xwl (ix) ) then 
if(ix.eq.l) then 

betax=bmax 
else 

betax=(wln(ix-l) -wln(ix) ) / 
& (xwl(ix)-xwl(ix-l))/delE 
endif 
else 

if (ix. eq. ixmax) then 



betax=bmin 
else 

betax=(wln(ix) -wln(ix+l) ) / 
& (xwl(ix+l)-xwl(ix))/delE 
endif 
endif 

Although these routines are modular, to transfer the relevant 
arrays and variables into the U(l) code, they are all kept in the 
common block common_ulmuca. f . 



C win MUCA logarithmic weights (w=exp(wln) ) . 

C hx Total count of histogram entries . 

C hxO For reconstruction of entries during 

C one recursion segment. 

C xwl Continuously updated mean values of 

C histogram bins 

C (used with MUCA weights) . 

C xmu Keeps track of mean values of 

C histogram bins during 

C fixed weights MUCA runs. 

C addwl Wang-Landau parameter. 

C flat Flatness of the histogram as measured 

C by hist_flat.f . 

C irupl Start of WL recursion loop. 

C irec Number of WL recursions done . 

C mucarun Number of MUCA run 

C (0 for WL recursion, then 1, 2, ...). 

C ntun Number of tunneling (cycling) events. 

C ltunO Logical used when incrementing ntun. 



common/wln/wln(ixmax) ,hx(ixmax) , 
& hxO(ixmax) ,xwl(ixmax) , addwl , xmu (ixmax) , 
& flat , irupl , irec , mucarun, ntun, ltunO 

The common block is on the specialized U(l) level, because 
the array dimension 

ixmax = Int [(actmax - actmin) (np/(4(nd- 1)))] 

depends on the number np of plaquettes of the U(l) lattice. The 
relevant parameters are set by the user in ulmuca . par. 

Once the system cycled from the minimum (|26*I > to the maxi- 
mum (|27] | action value and back 

actmin < — > actmax , (33) 

a WL recursion OTb is attempted. The cycling or tunneling con- 
dition ( T33l > ensures that the range of interest has indeed been 
covered. In addition we require that the sampled action his- 
togram is sufficiently flat. The flatness is defined by 

flatness = h mia /h m . dX , (34) 

where /z m j n and /z max are the smallest and largest numbers of 
histogram entries in the range of interest, and are calculated by 
our modular routine hist_f lat . f . Our cut on this quantity 
is set by f latcut in ulmuca. par. In our simulations J3l we 
used f latcut = 0.5. This is rather weak compared to the re- 
quirement in the original WL approach [15] that "the histogram 
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H(E) for all possible E is not less that 80% of the average 
histogram (H(E))", although their definition of flatness is less 
stringent than our Eq. d34l >. The conceptual difference is that the 
WL paper aims at iterating all the way towards an accurate esti- 
mate of the inverse spectral density, while we are content with a 
working estimate, which enables cycling (f33T >. The estimate of 
the spectral density is then postponed to our continuation with 
frozen weight for which convergence of the MCMC process is 
rigorously proven [14], whereas no such proof exists for the 
WL algorithm. 

It needs to be mentioned that the histogram hx is accumu- 
lated over the entire recursion process and the same is true for 
the refinement of the weights win. The histogram hxO keeps 
track of the entries between WL recursions. Presently this in- 
formation is not used in the code. One may consider to apply a 
flatness criterion to hxO instead of hx. This is just one of many 
fine tuning options, which we did not explore, because the WL 
recursion in its present form took just a few percent of the CPU 
time in our U(l) simulations [Q]. 

The desired number of WL recursions (|3TT i is set by the 
parameter nrec of ulmuca.par, typically nrec = 20 or 
somewhat larger. To achieve this, nrup (number (n) recur- 
sions (r) upper (up) limit) WL recursion attempts are allowed, 
each accommodating up to nupdt WL update sweeps. The 
update sweeps are interrupted for a recursion attempt when 
cycling (l33l is recorded by our modularly coded subroutine 
tuna_cnt . f , which checks after every sweep. So, nrup x 
nupdt is the maximum number of sweeps spent on the WL 
recursion part. The process is aborted if the given limit is ex- 
hausted before nrec WL recursions are completed. 

3.2. Fixed weights MUCA simulations and measurements 

Fixed weight MUCA simulations are performed by the rou- 
tines discussed in the previous section, only that the WL up- 
date ( f30b is no longer performed, what is programmed to be the 
case for addwl < 0. We still perform updates of the weights 
in-between (long) MUCA production runs, which is done by a 
call to ulmu_recl . f in the initialization routine ulmu_init . f . 

Overall our simulation consists of 

nequi + nrpt x nmeas x nsw 

BMHA sweeps, each (optionally) supplemented by 2 overre- 
laxation sweeps. During a MUCA simulation several physical 
quantities (described below in the section on data analysis) are 
measured by the ulsf .measure subroutine. 

Measurements are performed every nsw sweeps and accumu- 
lated in the arrays defined in common_ul . f : 



c tsa: Time series action array. 

c a_min,a_max: Action minimum and maximum for 

c a series of sweeps. 

c tsws,tswt: Time series for lattice 

c average spacelike and timelike 

c Wilson plaquette loops. 

c plreal,plimag: Space arrays for Polyakov 

c loops in (nd-1) dimensions. 

c tspr,tspi: Time series for lattice 

c average Polyakov loops. 

c isublat: sublattice for Polyakov loops 

c (in t=0 slice) . 

common /ul/ aphase (nd,ms) , act , amin , amax , 
& acpt ,tsa(nmeas) ,a_min,a_max,tsws(nmeas) , 
& tswt (nmeas) ,plreal(mxs) ,plimag(mxs) , 
& tspr (nmeas) , tspi (nmeas) , isublat (mxs) 

and in common_sf . f : 

C Arrays for structure function measurements, 
common/ulsf/ tssf (ntotalsf box, nmeas) , 
& nsf comp(l : ntotalsf box, :ndimsf ) 

Then each nmeas x nsw sweeps (i.e., on each iteration of the 
nrpt loop) the arrays with measurements are written on disk by 
the ulmu_rw_meas subroutine, which can also read data. With a 
call to ulwl_backup . f the current state of the lattice is backed 
up on disk. This allows one to restart the program from the 
latest iteration of the nrpt loop if it gets interrupted and ensures 
that not more than 1/nrpt of the total running time is lost in 
such a case. Typically, we set nrpt=32. 

3.3. Data analysis 

The program ana_ulmu . f calculates action, specific heat and 
two Polyakov loop susceptibilities, the program sf ana_ulmu . f 
Polyakov loop structure factors. Definitions are given in the 
following with names of corresponding gnuplot driver files in 
parenthesis. The specific heat (plot_C . pit) is 



COS) - 7rf<S 2 >-<S> 2 ] with N p = 



d(d-l) , 

„ N 3 ,N T (35) 



where S is the action (plot_a. pit). Besides the action we 
measure Polyakov loops and their low-momentum structure 
factors. Polyakov loops Pg are the C/y products along the 
straight lines in N T direction. For U(l) LGT each P% is a com- 
plex number on the unit circle and xlabels the sites of the spatial 
sublattice. From the sum over all Polyakov loops 



(36) 



c aphase (nd,ms) : Phase of the U(l) "matrix". 

c (We store aphase and 

c the matrix on the link is 

c e~{i aphase}.) 

c act: Energy (action) variable. 

c amin, amax: Action act minimum and 

c maximum of the MC sweep. 

c acpt: Counts accepted updates. 



we find the susceptibility of the absolute value \P\ 

X™M = -Jj [0P| 2 > - (\P\f ] (37) 
(plot_CP . pit), and the susceptibility with respect to p 



Zax05) = i4<|f|> 

N] dp 



(38) 



(plot_CPb.plt). The analogues in spin system are the mag- 
netic susceptibilities with respect to an external magnetic field 
and with respect to the temperature. 

3.3.1. Structure factors 

Structure factors are defined by 



F(k) 



1 



^ P(x) exp(ikx) 



In 
— i 



(39) 



where « is an integer vector to which we refer as momentum in 
the following (it differs from the structure factor momentum k 
by a constant prefactor) to describe how we average over differ- 
ent momenta. 
In sf . par 

C (0,0,1), (0,1,0), (1,0,0), and so on, 
C are stored separately. 
C ndimsf : number of dimensions for 
C the structure function. 

C nsf max : maximum value of 
C \vec{n}=(nl,n2,n3, . . . ,n_ndimsf ) 

C with =< nl,n2, . .n_ndimsf =< nsf max. 

C nsf box: total number of \vec{n} components, 
parameter (ndimsf =nd-l ,nsfmax=l , 
& nsfbox=(nsfmax+l)**ndimsf ) 

the dimension ndimsf of the sublattice on which SFs are cal- 
culated is defined and one sets the maximum value of the com- 
ponents of the momentum, nsf max, 

n = («1,«2, -,«ndimsf), (40) 

Hi — 0, nsf max, i — 1, ndimsf . (41) 
As rii counting is 0-based we measure during the simulation 



ntotalsf box = (nsf max + l) n 



(42) 



SF components. Their momenta are stored in the nsf comp ar- 
ray. In the example of a 4D multicanonical run given in the next 
section ntotalsf box=(l + 1) 3 =8. Initialization of the arrays 
for structure factor measurements is carried out by the routine 
sf _box_init, which also outputs how momenta are initialized 
and numbered: 

sf_box_init: Structure factors initialized: 

ndimsf = 3 

nsfmax = 1 

nsfbox = 8 
Integer vectors generated: 



# 


n~2 


components . . 




1 














2 


1 








1 


3 


1 





1 





4 


2 





1 


1 


5 


1 


1 








6 


2 


1 





1 


7 


2 


1 


1 





8 


3 


1 


1 


1 



These eight SFs are measured during this simulation. For a 
spatially symmetric lattice SF components with permuted mo- 
menta, i.e. (0,0,1), (0,1,0) and (1,0,0) are equivalent and 
there are only 

(nsfmax + ndimsf)! 

nsf dif f = (43) 

nsf maxlndimsf ! 

different modes. In the example nsf dif f =4 and they are 

(0,0,0), (0,0,1), (0,1,1) and (1,1,1). 

To average SFs over permutations of momenta one needs to 
identify momenta that differ up to permutations, calculate their 
multiplicity (i.e., the number of permutations) and construct a 
mapping from all momenta to the set of non-equivalent mo- 
menta. For this purpose the sf _box_shuf f le . f subroutine is 
used. It returns three arrays corresponding to the elements de- 
scribed above: nsf comp_dif f , nsf mult i, nsf mapping. Us- 
ing them the analysis program sfana_ulmu.f averages SF 
components and outputs them in files prefixed with the non- 
equivalent momenta components (for instance, the SF with 
it = (0,1,1) is output in Ollsf 006x004tmu01 .d). The SF 
normalization in ( |39l is defined so that F(k) = 1 at f3 — for all 
momenta and dimensions. The output of sf _box_shuf f le . f 
from our example run is: 

sf _box_shuf f le : 

Different SF components (integer vectors) : 



# multi 


n~2 


components . 






1 


1 
















2 


3 


1 










1 


3 


3 


2 





1 




1 


4 


1 


3 


1 


1 




1 


Mappin 


g of 


the components 


(888 separator) 


1 





88 


8 1 8E 


58 








2 





1 88 


8 2 8£ 


58 





1 


3 


1 


88 


8 2 8E 


58 





1 


4 


1 


1 88 


8 3 8£ 


58 


1 


1 


5 1 





88 


8 2 8E 


>8 





1 


6 1 





1 88 


8 3 8E 


>8 


1 


1 


7 1 


1 


88 


8 3 8E 


>8 


1 


1 


8 1 


1 


1 88 


8 4 8E 


>8 1 


1 


1 



sf _box_shuf f le done. 

The first part of the output shows that four different SF compo- 
nents were identified and in the second part the mapping from 
the eight original momenta is shown. 

If only partial measurements are available, one can choose 
the parameter nset in ana_ulmu.f or sf ana_ulmu.f, which 
is preset to nset = nrpt, smaller than nrpt. 

3.3.2. Reweighting to the canonical ensemble 

The analysis programs are reweighting to the canonical 
ensemble. The simulation is performed with exp(wln(£)) 
weights, which need to be replaced by the Boltzmann factor 
exp(-fJE). Given a set of N multicanonical configurations the 
estimator for an observable O in the canonical ensemble is 



sf_box_init done. 



(0)(J3) = 



££i exp( -fiEf - wln(£,)) 



(44) 



Eq. (04| involves large terms in the numerator and denominator 
that can cause an overflow. To avoid this we use logarithmic 
coding as described in 11411 . Instead of adding two numbers one 
expresses the logarithm of the sum through their logarithms. 
With this strategy one effectively evaluates the logarithm of the 
numerator and denominator, which are of the same order, and 
exponentiates the difference. 

The ul_ts_zln.f subroutine performs the reweighting of 
the time series to a given value of /3 according to Eq. d44l) . Since 
the reweighting procedure is non-linear, one expects a bias, 
which is for nrpt patches proportional to Ti nt /(nmeas * nsw). 
Using jackknife error bars the bias becomes reduced by a factor 
l/(nrpt-l). This is realized by the ul_ts_zlnj . f subroutine. 
If one is not yet satisfied, one can go on and use the jackknife 
approach to estimate the bias explicitly. 



1. MUCA run 
H 2. MUCA run 

i l ' 

! i 

1 i 
I i 





0.9 



0.95 



1.05 



3.4. Example runs 

We have prepared MUCA example runs in the subfolders 

MUCA3D04t06xb2p0 and MUCA4D04t06xblpl 

of the folder ExampleRuns. The values of actmin and actmax 
in ulmuca.par are estimates from the previously discussed 
short canonical runs. The last four characters in the subfolder 
names denote the value of beta_table for which the BMHA 
table is calculated. 

The * . d test files left in these subfolders were obtained 
from the analysis of MUCA data obtained by the preset runs. 
The MUCA data themselves are produced in * . D files, which 
have been removed, because they are unformatted and read- 
ability is only guaranteed on the platform on which they are 
produced. The MUCA data production goes through three 
steps of individual runs. First one has to compile and run 
the program ulwl_bmho . f , which uses our WL recursion to 
obtain a working estimate of the MUCA weights. Subse- 
quently two runs of MUCA data production are performed by 
the program ulmu_bmho . f . After each data production step 
one may execute the data analysis programs ana_ulmu . f and 
sf ana_ulmu.f . 

In our examples the WL recursion is considered complete af- 
ter 20 successful recursion steps PTT ). In 3D this was achieved 
after 22 cycling (tunneling) events. In 4D 23 cycling events 
were needed. Then, during the simulation with fixed weights, 
more than 1 000 tunnelings per job were recorded in 3D. In 
4D 214 tunnelings occurred in the first and 247 in the second 
MUCA run. These numbers vary across different platforms. 
The results of the analysis programs are shown in Fig.[3]for the 
specific heat, Fig. [4] and [5] for the susceptibilities and in Fig. [6] 
for the first three non-trivial structure factors. On our 2 GHz PC 
the data production took 74m per job. Before, the WL recursion 
completed in just 2m27s. 

In 3D the specific heat does not diverge and the transition 
is much broader. We show in Fig. [7] the Polyakov susceptibil- 
ity. On our PC the WL recursion completed in 6s and the data 
production took 7m3s per job. 
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Figure 3: Specific heat on a 6 3 4 lattice. 
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Figure 4: Polyakov loop susceptibility on a 6 4 lattice. 
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Figure 5: Polyakov loop susceptibility with respect to fi on a 6 3 4 lattice 
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Figure 6: Structure factors on a 6 3 4 lattice (1. and 2. MUCA runs). 
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Figure 7: Polyakov loop susceptibility on a 6 2 4 lattice. 

4. Summary and Conclusions 

We think that the open source Fortran code documented in 
this paper can be modified for many applications in statistical 
physics and LGT, considerably beyond the U(l) gauge group. 
A number of parameters can be varied, but one should have 
in mind that most of them have not been tested. Obviously, 
it is in the responsibility of the user to perform rigid tests and 
verifications before trusting any of the result. 
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